NSF PAR Search | NSF Public Access Repository

TMModel: Modeling Texture Memory and Mobile GPU Performance to Accelerate DNN Computations

Guan, J; Hu, Z; Antonopoulos, C; Bellas, S; Smirni, E; Zhou, G; Agrawal, G; Ren, B (June 2025, ACM)

Free, publicly-accessible full text available June 10, 2026
TMModel: Modeling Texture Memory and Mobile GPU Performance to Accelerate DNN Computations

Guan, J; Hu, Z; Antonopoulos, C; Bellas, N; Lalis, S; Smirni, E; Zhou, G; Agrawal, G; Ren, B (June 2025, ACM - Proceedings of ICS 2025)

The demand for Deep Neural Network (DNN) execution (including both inference and training) on mobile systems-on-a-chip (SoCs) has surged, driven by factors like the need for real-time latency, privacy, and reducing vendors’ costs. Mainstream mobile GPUs (e.g., Qualcomm Adreno GPUs) usually have a 2.5D L1 texture cache that offers throughput superior to that of on-chip memory. However, to date, there is limited understanding of the performance features of such a 2.5D cache, which limits the optimization potential. This paper introduces TMModel, a framework with three components: 1) a set of micro-benchmarks and a novel performance assessment methodology to characterize a poorly documented architecture with 2D memory, 2) a complete analytical performance model configurable for different data access pattern(s), tiling size(s), and other GPU execution parameters for a given operator (and its associated size and shape), and 3) a compilation framework that incorporates this model and generates optimized code with low overhead. TMModel is validated both on a set of DNN kernels and on training complete models on mobile GPUs.
Free, publicly-accessible full text available June 9, 2026
Team-PSRO for Learning Approximate TMECor in Large Team Games via Cooperative Reinforcement Learning

McAleer, S; Farina, G; Zhou, G; Wang, M; Yang, Y; Sandholm, T (December 2023, NeurIPS)

Full Text Available
Team-PSRO for Learning Approximate TMECor in Large Team Games via Cooperative Reinforcement Learning

McAleer, S; Farina, G; Zhou, G; Wang, M; Yang, Y; Sandholm, T (December 2023, NeurIPS23)

Recent algorithms have achieved superhuman performance at a number of two-player zero-sum games such as poker and Go. However, many real-world situations are multi-player games. Zero-sum two-team games, such as bridge and football, involve two teams in which each member shares the same reward with every other member of the team, and each team receives the negative of the other team’s reward. A popular solution concept in this setting, called TMECor, assumes that teams can jointly correlate their strategies before play but cannot communicate during play. This setting is harder than two-player zero-sum games because each player on a team has different information and must use their public actions to signal to other members of the team. Prior works either have game-theoretic guarantees but only work in very small games, or scale to large games but lack game-theoretic guarantees. In this paper we introduce two algorithms: Team-PSRO, an extension of PSRO from two-player games to team games, and Team-PSRO Mix-and-Match, which improves upon Team-PSRO by making better use of population policies. In Team-PSRO, in every iteration both teams learn a joint best response to the opponent’s meta-strategy via reinforcement learning. As the reinforcement learning joint best response approaches the optimal best response, Team-PSRO is guaranteed to converge to a TMECor. In experiments on Kuhn poker and Liar’s Dice, we show that a tabular version of Team-PSRO converges to TMECor, and a version of Team-PSRO using deep cooperative reinforcement learning beats self-play reinforcement learning in the large game of Google Research Football.
Full Text Available
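The iteration described in the Team-PSRO abstract (solve a restricted game over the current policy populations, then add each side's best response to the opponent's meta-strategy) can be illustrated with plain two-player PSRO. The sketch below is not the authors' Team-PSRO. It assumes a two-player normal-form game (rock-paper-scissors), computes exact best responses over pure strategies instead of learning them with reinforcement learning (which makes it the double oracle algorithm), and uses regret-matching self-play as a stand-in meta-solver. The names `solve_restricted` and `psro` are hypothetical.

```python
import numpy as np

# Row player's payoffs for rock-paper-scissors; the column player receives -A.
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])


def regret_matching_strategy(regrets):
    """Play in proportion to positive regret (uniform if none is positive)."""
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    return positive / total if total > 0 else np.full(len(regrets), 1.0 / len(regrets))


def solve_restricted(M, iters=2000):
    """Hypothetical meta-solver: regret-matching self-play on the restricted
    zero-sum game M; the *average* strategies approach a Nash equilibrium."""
    m, n = M.shape
    reg_r, reg_c = np.zeros(m), np.zeros(n)
    avg_r, avg_c = np.zeros(m), np.zeros(n)
    for _ in range(iters):
        x = regret_matching_strategy(reg_r)
        y = regret_matching_strategy(reg_c)
        avg_r += x
        avg_c += y
        u_r = M @ y        # row player's payoff for each pure row strategy
        u_c = -(x @ M)     # column player's payoff for each pure column strategy
        reg_r += u_r - x @ u_r
        reg_c += u_c - y @ u_c
    return avg_r / iters, avg_c / iters


def psro(A, max_rounds=10):
    """PSRO loop with exact pure-strategy best responses (i.e., double oracle)."""
    pop_r, pop_c = [0], [0]  # each population starts with one arbitrary pure strategy
    for _ in range(max_rounds):
        x, y = solve_restricted(A[np.ix_(pop_r, pop_c)])
        # Best responses to the opponent's meta-strategy, searched over the FULL game.
        br_r = int(np.argmax(A[:, pop_c] @ y))
        br_c = int(np.argmin(x @ A[pop_r, :]))
        # Heuristic stop: neither best response adds a new strategy.
        if br_r in pop_r and br_c in pop_c:
            break
        if br_r not in pop_r:
            pop_r.append(br_r)
        if br_c not in pop_c:
            pop_c.append(br_c)
    return pop_r, pop_c, x, y


pop_r, pop_c, x, y = psro(A)
print(pop_r, pop_c, np.round(x, 3), np.round(y, 3))
```

Starting from rock only, each round adds the action that beats the current meta-strategy (paper, then scissors), after which the meta-strategies settle near uniform and no new best response appears. In Team-PSRO, as the abstract describes, the best-response step is instead a joint best response for a whole team learned via reinforcement learning.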
Leveraging Granularity: Hierarchical Reinforcement Learning for Pedagogical Policy Induction

https://doi.org/10.1007/s40593-021-00269-9

Zhou, G.; Azizsoltani, H.; Ausin, M. S.; Barnes, T.; Chi, M. (January 2022, International Journal of Artificial Intelligence in Education)

Full Text Available
Evaluating Critical Reinforcement Learning Framework in the Field

Ju, S.; Zhou, G.; Barnes, T.; Chi, M. (January 2021, AIED)
Full Text Available
Evaluating critical reinforcement learning framework in the field

https://doi.org/10.1007/978-3-030-78292-4_18

Ju, S.; Zhou, G.; Abdelshiheed, M.; Barnes, T.; Chi, M. (January 2021, International Conference on Artificial Intelligence in Education)

Full Text Available
Capacities and efficient computation of first-passage probabilities

Loper, J; Zhou, G; Geman, S. (August 2020, Physical Review)

A reversible diffusion process is initialized at position x₀ and run until it first hits any of several targets. What is the probability that it terminates at a particular target? We propose a computationally efficient approach for estimating this probability, focused on those situations in which it takes a long time to hit any target. In these cases, direct simulation of the hitting probabilities becomes prohibitively expensive. On the other hand, if the timescales are sufficiently long, then the system will essentially “forget” its initial condition before it encounters a target. In these cases the hitting probabilities can be accurately approximated using only local simulations around each target, obviating the need for direct simulations. In empirical tests, we find that these local estimates can be computed in the same time it would take to compute a single direct simulation, but that they achieve an accuracy that would require thousands of direct simulation runs.
Full Text Available
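The "direct simulation" baseline that the Capacities abstract calls prohibitively expensive can be sketched as follows. This is not the paper's local-estimate method. It assumes a symmetric random walk on the integers 0..n as a discrete stand-in for a reversible diffusion, with two targets at 0 and n; for this walk the exact answer is known (gambler's ruin: start/n), so the estimate can be checked. The function name `direct_hitting_probability` is hypothetical.

```python
import random


def direct_hitting_probability(start, n, runs, seed=0):
    """Estimate P(walk from `start` reaches target n before target 0) by direct simulation."""
    rng = random.Random(seed)
    hits_n = 0
    for _ in range(runs):
        pos = start
        # Run the symmetric walk until it first hits either target (0 or n).
        while 0 < pos < n:
            pos += 1 if rng.random() < 0.5 else -1
        if pos == n:
            hits_n += 1
    return hits_n / runs


start, n = 3, 10
estimate = direct_hitting_probability(start, n, runs=20000)
exact = start / n  # gambler's-ruin formula for the symmetric walk
print(f"direct estimate {estimate:.3f} vs exact {exact:.3f}")
```

The Monte Carlo standard error is sqrt(p(1 - p) / runs), about 0.003 here, so each additional digit of accuracy costs roughly 100 times more runs. Each run also lasts start·(n - start) steps on average for this walk, so the cost grows with the time it takes to hit a target, which is the regime the abstract targets.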
Hierarchical Reinforcement Learning for Pedagogical Policy Induction (extended abstract)

Zhou, G. (January 2020, Proceedings of the 29th International Joint Conference on Artificial Intelligence)

Full Text Available
